Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics

Author: Farhadi Ali
Mottaghi Roozbeh
Weihs Luca
Zeng Kuo-Hao
Publication date: 24/04/2023

A common assumption when training embodied agents is that the impact of taking an action is stable; for instance, executing the "move ahead" action will always move the agent forward by a fixed distance, perhaps with some small amount of actuator-induced noise. This assumption is limiting; an agent may encounter settings that dramatically alter the impact of actions: a move ahead action on a wet floor may send the agent twice as far as it expects, and using the same action with a broken wheel might transform the expected translation into a rotation. Instead of relying on the assumption that the impact of an action stably reflects its pre-defined semantic meaning, we propose to model the impact of actions on-the-fly using latent embeddings. By combining these latent action embeddings with a novel, transformer-based policy head, we design an Action Adaptive Policy (AAP). We evaluate our AAP on two challenging visual navigation tasks in the AI2-THOR and Habitat environments and show that our AAP is highly performant even when faced, at inference time, with missing actions and previously unseen, perturbed action spaces. Moreover, we observe significant improvement in robustness against these actions when evaluating in real-world scenarios.

Comment: 21 pages, 17 figures, ICLR 202

arXiv.org e-Print Archive

The impact of giant jellyfish Nemopilema nomurai blooms on plankton communities in a temperate marginal sea

Author: Bangqin Huang
Chaolun Li
Fang Zhang
Hao Wei
Kuo-Ping Chiang
Qingzhen Yao
Tiezhu Mi
Wupeng Xiao
Xin Liu
Xuguang Huang
Yang Zeng
Publication venue: 'Elsevier BV'
Publication date: 07/12/2019

This study focused on the impact of the bloom-developing process of the giant jellyfish, Nemopilema nomurai, on phytoplankton and microzooplankton communities. Two repeated field observations of the jellyfish bloom were conducted in June 2012 and 2014 in the southern Yellow Sea, where blooms of N. nomurai were frequently observed. We demonstrated that the bloom was made up of two stages, namely the developing stage and the mature stage. Total chlorophyll a increased and the concentrations of inorganic nutrients decreased during the developing stage, while both remained stable and at low levels during the mature stage. Our analysis revealed that phosphate excreted by growing N. nomurai promoted the growth of phytoplankton at the developing stage. At the mature stage, the size composition of microzooplankton was altered and tended to be smaller via a top-down process, while phytoplankton composition, affected mainly through a bottom-up process, shifted toward fewer diatoms and cryptophytes but more dinoflagellates.

Xiamen University Institutional Repository

Poet: Product-oriented Video Captioner for E-commerce

Author: Banerjee Satanjeev
Das Pradipto
David
Kipf Thomas N
Lin Chin-Yew
Liu Jingyuan
Liu Ziwei
Lu Jiasen
Papineni Kishore
Regneri Michaela
Sigurdsson Gunnar A.
Speer Robyn
Wang Bairui
Weston Jason
Whitehead Spencer
Yao Li
Zeng Kuo-Hao
Zhang Junchao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 16/08/2020

In e-commerce, a growing number of user-generated videos are used for product promotion. Generating video descriptions that narrate the user-preferred product characteristics depicted in the video is vital for successful promotion. Traditional video captioning methods, which focus on routinely describing what exists and happens in a video, are not well suited to product-oriented video captioning. To address this problem, we propose a product-oriented video captioner framework, abbreviated as Poet. Poet first represents the videos as product-oriented spatial-temporal graphs. Then, based on the aspects of the video-associated product, we perform knowledge-enhanced spatial-temporal inference on those graphs to capture the dynamic change of fine-grained product-part characteristics. The knowledge leveraging module in Poet differs from the traditional design by performing knowledge filtering and dynamic memory modeling. We show that Poet achieves consistent performance improvement over previous methods in generation quality, product aspect capturing, and lexical diversity. Experiments are performed on two product-oriented video captioning datasets, the buyer-generated fashion video dataset (BFVD) and the fan-generated fashion video dataset (FFVD), collected from Mobile Taobao. We will release the desensitized datasets to promote further investigations on both video captioning and general video analysis problems.

Comment: 10 pages, 3 figures, to appear in ACM MM 2020 proceeding

arXiv.org e-Print Archive

Crossref

Uncertainty-based Traffic Accident Anticipation with Spatio-Temporal Relational Learning

Author: Blundell Charles
Bradbury James
Cai Zhaowei
Chan Fu-Hsiang
Chung Junyoung
Corcoran G.
Defferrard Michaël
Fang J.
Gal Yarin
Geiger Andreas
Graves Alex
Hajiramezanali Ehsan
John
Kendall Alex
Kingma Diederik P
Kipf Thomas N
Lin Tsung-Yi
Ma Shugao
Paszke Adam
Ren Shaoqing
Rezende Danilo Jimenez
Suzuki Tomoyuki
Vaswani Ashish
Xie Saining
Yao Yu
Yu Fisher
Zeng Kuo-Hao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/08/2020

Traffic accident anticipation aims to predict accidents from dashcam videos as early as possible, which is critical to safety-guaranteed self-driving systems. With cluttered traffic scenes and limited visual cues, it is very challenging to predict how soon an accident will occur from early observed frames. Most existing approaches are developed to learn features of accident-relevant agents for accident anticipation, while ignoring the features of their spatial and temporal relations. Besides, current deterministic deep neural networks could be overconfident in false predictions, leading to a high risk of traffic accidents caused by self-driving systems. In this paper, we propose an uncertainty-based accident anticipation model with spatio-temporal relational learning. It sequentially predicts the probability of traffic accident occurrence from dashcam videos. Specifically, we propose to take advantage of graph convolution and recurrent networks for relational feature learning, and leverage Bayesian neural networks to address the intrinsic variability of latent relational representations. The derived uncertainty-based ranking loss is found to significantly boost model performance by improving the quality of relational features. In addition, we collect a new Car Crash Dataset (CCD) for traffic accident anticipation, which contains annotations of environmental attributes and accident reasons. Experimental results on both public and the newly compiled datasets show state-of-the-art performance of our model. Our code and CCD dataset are available at https://github.com/Cogito2012/UString.

Comment: Accepted by ACM MM 202

arXiv.org e-Print Archive

Crossref